{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ur8xi4C7S06n"
      },
      "outputs": [],
      "source": [
        "# Copyright 2024 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JAPoU8Sm5E6e"
      },
      "source": [
        "# Intro to Building a Scalable and Modular RAG System with RAG Engine in Vertex AI \n",
        "\n",
        "<table align=\"left\">\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/rag-engine/intro_rag_engine.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Frag-engine%2Fintro_rag_engine.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/rag-engine/intro_rag_engine.ipynb\">\n",
        "      <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/rag-engine/intro_rag_engine.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://raw.githubusercontent.com/primer/octicons/refs/heads/main/icons/mark-github-24.svg\" alt=\"GitHub logo\"><br> View on GitHub\n",
        "    </a>\n",
        "  </td>\n",
        "</table>\n",
        "\n",
        "<div style=\"clear: both;\"></div>\n",
        "\n",
        "<b>Share to:</b>\n",
        "\n",
        "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/rag-engine/intro_rag_engine.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/rag-engine/intro_rag_engine.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/rag-engine/intro_rag_engine.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/rag-engine/intro_rag_engine.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/rag-engine/intro_rag_engine.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
        "</a>            "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "84f0f73a0f76"
      },
      "source": [
        "| Author |\n",
        "| --- |\n",
        "| [Holt Skinner](https://github.com/holtskinner) |"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tvgnzT1CKxrO"
      },
      "source": [
        "## Overview\n",
        "\n",
        "Retrieval Augmented Generation (RAG) improves Large Language Models (LLMs) by allowing them to access and process external information sources during generation. This ensures the model's responses are grounded in factual data and avoids hallucinations.\n",
        "\n",
        "A common problem with LLMs is that they don't understand private knowledge, that\n",
        "is, your organization's data. With RAG Engine, you can enrich the\n",
        "LLM context with additional private information, because the model can reduce\n",
        "hallucinations and answer questions more accurately.\n",
        "\n",
        "By combining additional knowledge sources with the existing knowledge that LLMs\n",
        "have, a better context is provided. The improved context along with the query\n",
        "enhances the quality of the LLM's response.\n",
        "\n",
        "The following concepts are key to understanding Vertex AI RAG Engine. These concepts are listed in the order of the\n",
        "retrieval-augmented generation (RAG) process.\n",
        "\n",
        "1. **Data ingestion**: Intake data from different data sources. For example,\n",
        "  local files, Google Cloud Storage, and Google Drive.\n",
        "\n",
        "1. **Data transformation**: Conversion of the data in preparation for indexing. For example, data is split into chunks.\n",
        "\n",
        "1. **Embedding**: Numerical representations of words or pieces of text. These numbers capture the\n",
        "   semantic meaning and context of the text. Similar or related words or text\n",
        "   tend to have similar embeddings, which means they are closer together in the\n",
        "   high-dimensional vector space.\n",
        "\n",
        "1. **Data indexing**: RAG Engine creates an index called a corpus.\n",
        "   The index structures the knowledge base so it's optimized for searching. For\n",
        "   example, the index is like a detailed table of contents for a massive\n",
        "   reference book.\n",
        "\n",
        "1. **Retrieval**: When a user asks a question or provides a prompt, the retrieval\n",
        "  component in RAG Engine searches through its knowledge\n",
        "  base to find information that is relevant to the query.\n",
        "\n",
        "1. **Generation**: The retrieved information becomes the context added to the\n",
        "  original user query as a guide for the generative AI model to generate\n",
        "  factually grounded and relevant responses.\n",
        "\n",
        "For more information, refer to the public documentation for [Vertex AI RAG Engine](https://cloud.google.com/vertex-ai/generative-ai/docs/rag-overview)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "61RBz8LLbxCR"
      },
      "source": [
        "## Get started"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "No17Cw5hgx12"
      },
      "source": [
        "### Install Vertex AI SDK and Google Gen AI SDK\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "tFy3H3aPgx12"
      },
      "outputs": [],
      "source": [
        "%pip install --upgrade --quiet google-cloud-aiplatform google-genai"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dmWOrTJ3gx13"
      },
      "source": [
        "### Authenticate your notebook environment (Colab only)\n",
        "\n",
        "If you're running this notebook on Google Colab, run the cell below to authenticate your environment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "NyKGtVQjgx13"
      },
      "outputs": [],
      "source": [
        "import sys\n",
        "\n",
        "if \"google.colab\" in sys.modules:\n",
        "    from google.colab import auth\n",
        "\n",
        "    auth.authenticate_user()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DF4l8DTdWgPY"
      },
      "source": [
        "### Set Google Cloud project information and initialize Vertex AI SDK\n",
        "\n",
        "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n",
        "\n",
        "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).\n",
        "\n",
        "See [supported regions](https://cloud.google.com/vertex-ai/generative-ai/docs/rag-engine/rag-overview#supported-regions) for location options.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Nqwi-5ufWp_B"
      },
      "outputs": [],
      "source": [
        "# Use the environment variable if the user doesn't provide Project ID.\n",
        "import os\n",
        "\n",
        "import vertexai\n",
        "from google import genai\n",
        "\n",
        "# fmt: off\n",
        "PROJECT_ID = \"[your-project-id]\"  # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
        "# fmt: on\n",
        "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
        "    PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n",
        "\n",
        "# See https://cloud.google.com/vertex-ai/generative-ai/docs/rag-engine/rag-overview#supported-regions for location options.\n",
        "vertexai.init(project=PROJECT_ID, location=\"us-east4\")\n",
        "client = genai.Client(vertexai=True, project=PROJECT_ID, location=\"global\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5303c05f7aa6"
      },
      "source": [
        "### Import libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "6fc324893334"
      },
      "outputs": [],
      "source": [
        "from IPython.display import Markdown, display\n",
        "from google.genai.types import GenerateContentConfig, Retrieval, Tool, VertexRagStore\n",
        "from vertexai import rag"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e43229f3ad4f"
      },
      "source": [
        "### Create a RAG Corpus"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "cf93d5f0ce00"
      },
      "outputs": [],
      "source": [
        "# Currently supports Google first-party embedding models\n",
        "# fmt: off\n",
        "EMBEDDING_MODEL = \"publishers/google/models/text-embedding-005\"  # @param {type:\"string\", isTemplate: true}\n",
        "# fmt: on\n",
        "\n",
        "rag_corpus = rag.create_corpus(\n",
        "    display_name=\"my-rag-corpus\",\n",
        "    backend_config=rag.RagVectorDbConfig(\n",
        "        rag_embedding_model_config=rag.RagEmbeddingModelConfig(\n",
        "            vertex_prediction_endpoint=rag.VertexPredictionEndpoint(\n",
        "                publisher_model=EMBEDDING_MODEL\n",
        "            )\n",
        "        )\n",
        "    ),\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "197c585b61b2"
      },
      "source": [
        "### Check the corpus just created"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "f229b13dc617"
      },
      "outputs": [],
      "source": [
        "rag.list_corpora()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "c52924cc1440"
      },
      "source": [
        "### Upload a local file to the corpus"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4976ffe8564f"
      },
      "outputs": [],
      "source": [
        "%%writefile test.md\n",
        "\n",
        "Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by allowing them to access and incorporate external data sources when generating responses. Here's a breakdown:\n",
        "\n",
        "**What it is:**\n",
        "\n",
        "* **Combining Retrieval and Generation:**\n",
        "    * RAG combines the strengths of information retrieval systems (like search engines) with the generative power of LLMs.\n",
        "    * It enables LLMs to go beyond their pre-trained data and access up-to-date and specific information.\n",
        "* **How it works:**\n",
        "    * When a user asks a question, the RAG system first retrieves relevant information from external data sources (e.g., databases, documents, web pages).\n",
        "    * This retrieved information is then provided to the LLM as additional context.\n",
        "    * The LLM uses this augmented context to generate a more accurate and informative response.\n",
        "\n",
        "**Why it's helpful:**\n",
        "\n",
        "* **Access to Up-to-Date Information:**\n",
        "    * LLMs are trained on static datasets, so their knowledge can become outdated. RAG allows them to access real-time or frequently updated information.\n",
        "* **Improved Accuracy and Factual Grounding:**\n",
        "    * RAG reduces the risk of LLM \"hallucinations\" (generating false or misleading information) by grounding responses in verified external data.\n",
        "* **Enhanced Contextual Relevance:**\n",
        "    * By providing relevant context, RAG enables LLMs to generate more precise and tailored responses to specific queries.\n",
        "* **Increased Trust and Transparency:**\n",
        "    * RAG can provide source citations, allowing users to verify the information and increasing trust in the LLM's responses.\n",
        "* **Cost Efficiency:**\n",
        "    * Rather than constantly retraining large language models, RAG allows for the introduction of new data in a more cost effective way.\n",
        "\n",
        "In essence, RAG bridges the gap between the vast knowledge of LLMs and the need for accurate, current, and contextually relevant information.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "529390917c29"
      },
      "outputs": [],
      "source": [
        "rag_file = rag.upload_file(\n",
        "    corpus_name=rag_corpus.name,\n",
        "    path=\"test.md\",\n",
        "    display_name=\"test.md\",\n",
        "    description=\"my test file\",\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5269a0c2786d"
      },
      "source": [
        "### Import files from Google Cloud Storage\n",
        "\n",
        "Remember to grant \"Viewer\" access to the \"Vertex RAG Data Service Agent\" (with the format of `service-{project_number}@gcp-sa-vertex-rag.iam.gserviceaccount.com`) for your Google Cloud Storage bucket.\n",
        "\n",
        "For this example, we'll use a public GCS bucket containing earning reports from Alphabet."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5910ae450f69"
      },
      "outputs": [],
      "source": [
        "INPUT_GCS_BUCKET = (\n",
        "    \"gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/\"\n",
        ")\n",
        "\n",
        "response = rag.import_files(\n",
        "    corpus_name=rag_corpus.name,\n",
        "    paths=[INPUT_GCS_BUCKET],\n",
        "    # Optional\n",
        "    transformation_config=rag.TransformationConfig(\n",
        "        chunking_config=rag.ChunkingConfig(chunk_size=1024, chunk_overlap=100)\n",
        "    ),\n",
        "    max_embedding_requests_per_min=900,  # Optional\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "60a84095746d"
      },
      "source": [
        "### Import files from Google Drive\n",
        "\n",
        "Eligible paths can be formatted as:\n",
        "\n",
        "- `https://drive.google.com/drive/folders/{folder_id}`\n",
        "- `https://drive.google.com/file/d/{file_id}`.\n",
        "\n",
        "Remember to grant \"Viewer\" access to the \"Vertex RAG Data Service Agent\" (with the format of `service-{project_number}@gcp-sa-vertex-rag.iam.gserviceaccount.com`) for your Drive folder/files.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0a90c125874c"
      },
      "outputs": [],
      "source": [
        "response = rag.import_files(\n",
        "    corpus_name=rag_corpus.name,\n",
        "    paths=[\"https://drive.google.com/drive/folders/{folder_id}\"],\n",
        "    # Optional\n",
        "    transformation_config=rag.TransformationConfig(\n",
        "        chunking_config=rag.ChunkingConfig(chunk_size=512, chunk_overlap=50)\n",
        "    ),\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f700b3e23121"
      },
      "source": [
        "### Optional: Perform direct context retrieval"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4669c5cdbb5a"
      },
      "outputs": [],
      "source": [
        "# Direct context retrieval\n",
        "response = rag.retrieval_query(\n",
        "    rag_resources=[\n",
        "        rag.RagResource(\n",
        "            rag_corpus=rag_corpus.name,\n",
        "            # Optional: supply IDs from `rag.list_files()`.\n",
        "            # rag_file_ids=[\"rag-file-1\", \"rag-file-2\", ...],\n",
        "        )\n",
        "    ],\n",
        "    rag_retrieval_config=rag.RagRetrievalConfig(\n",
        "        top_k=10,  # Optional\n",
        "        filter=rag.Filter(\n",
        "            vector_distance_threshold=0.5,  # Optional\n",
        "        ),\n",
        "    ),\n",
        "    text=\"What is RAG and why it is helpful?\",\n",
        ")\n",
        "print(response)\n",
        "\n",
        "# Optional: The retrieved context can be passed to any SDK or model generation API to generate final results.\n",
        "# context = \" \".join([context.text for context in response.contexts.contexts]).replace(\"\\n\", \"\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "79ea89661842"
      },
      "source": [
        "### Create RAG Retrieval Tool"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0ebceac3d816"
      },
      "outputs": [],
      "source": [
        "# Create a tool for the RAG Corpus\n",
        "rag_retrieval_tool = Tool(\n",
        "    retrieval=Retrieval(\n",
        "        vertex_rag_store=VertexRagStore(\n",
        "            rag_corpora=[rag_corpus.name],\n",
        "            similarity_top_k=10,\n",
        "            vector_distance_threshold=0.5,\n",
        "        )\n",
        "    )\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "d88fa7ede853"
      },
      "source": [
        "### Generate Content with Gemini using RAG Retrieval Tool"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "8dd928baecd4"
      },
      "outputs": [],
      "source": [
        "MODEL_ID = \"gemini-2.0-flash-001\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "124b36be8d5b"
      },
      "outputs": [],
      "source": [
        "response = client.models.generate_content(\n",
        "    model=MODEL_ID,\n",
        "    contents=\"What is RAG?\",\n",
        "    config=GenerateContentConfig(tools=[rag_retrieval_tool]),\n",
        ")\n",
        "\n",
        "display(Markdown(response.text))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0268fe43d41c"
      },
      "source": [
        "### Generate Content with Llama3 using RAG Retrieval Tool"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "f6e67ee7968c"
      },
      "outputs": [],
      "source": [
        "from vertexai import generative_models\n",
        "\n",
        "# Load tool into Llama model\n",
        "rag_retrieval_tool = generative_models.Tool.from_retrieval(\n",
        "    retrieval=rag.Retrieval(\n",
        "        source=rag.VertexRagStore(\n",
        "            rag_resources=[rag.RagResource(rag_corpus=rag_corpus.name)],\n",
        "            rag_retrieval_config=rag.RagRetrievalConfig(\n",
        "                top_k=10,  # Optional\n",
        "                filter=rag.Filter(\n",
        "                    vector_distance_threshold=0.5,  # Optional\n",
        "                ),\n",
        "            ),\n",
        "        ),\n",
        "    )\n",
        ")\n",
        "\n",
        "llama_model = generative_models.GenerativeModel(\n",
        "    # your self-deployed endpoint for Llama3\n",
        "    \"projects/{project}/locations/{location}/endpoints/{endpoint_resource_id}\",\n",
        "    tools=[rag_retrieval_tool],\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "c6d710b6dece"
      },
      "outputs": [],
      "source": [
        "response = llama_model.generate_content(\"What is RAG?\")\n",
        "\n",
        "display(Markdown(response.text))"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "intro_rag_engine.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
